Introduction

mgwrsar package estimates linear and local linear models with or without spatial autocorrelation and multiscale GWR. However this vignette is mainly dedicated to canonical GWR and mixed GWR without spatial autocorrelation (see other vignettes for models with spatial autocorrelation or with Top Down Scale approach for multiscale GWR):

\(y=\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (GWR)\)

\(y=\beta_c X_c+\beta_v(u_i,v_i) X_v+\,\epsilon_i\;\;\;\;\;\;\;\; (Mixed-GWR)\)

Estimation and Model Specifications

For a comprehensive understanding of the estimation method with spatial autocorrelation, please refer to:
Geniaux, G. and Martinetti, D. (2018). A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics.
DOI: 10.1016/j.regsciurbeco.2017.04.001

MGWRSAR Wrapper Function

The estimation of the aforementioned model can be performed using the MGWRSAR wrapper function, which requires:
- A dataframe and a matrix of coordinates, or
- Objects such as SpatialPointsDataFrame, SpatialGridDataFrame, SpatialPixelsDataFrame, or an sf dataframe.

Key Features and Notes

  1. Optimal Bandwidth Estimation:
    • The golden_search_bandwidth function can be used to estimate optimal bandwidths, considering either AICc or Leave-One-Out Cross Validation (LOOCV).
    • This applies to all models, including those with or without spatial autocorrelation.
  2. Kernel Options:
    • Available kernels include Gaussian (gauss), Epanechnikov (epane), Bisquare (bisq), Trisquare (trisq), Triangle (triangle), and Rectangle (rectangle).
    • Kernels can be applied to space coordinates only (Type='GD') or space-time coordinates (Type='GDT').
    • Both adaptive (bandwidth as a number of neighbors) and non-adaptive (bandwidth as a distance) kernels are supported.
  3. Prediction Modes:
    • Three prediction modes are available for spatially varying coefficients models.
    • Refer to Geniaux (2024), “Speeding up estimation of spatially varying coefficients models” Journal of Geographycal Systems.
  4. Spatial Autocorrelation Predictions:
    • Predictions for models incorporating spatial autocorrelation can be conducted using the Best Linear Unbiased Prediction (BLUP) method proposed by Goulard et al. (2017).
    • For details, see the vignette: GWR-and-Mixed-GWR-with-spatial-autocorrelation.
  5. Model Testing:
    • Model specifications and spatial stationarity of coefficients can be tested using bootstrap methods.
  6. Temporal GWR/Mixed-GWR Models:
    • The mgwrsar package supports the estimation of temporal GWR/Mixed-GWR models using the Type='GDT' kernel.
    • It accommodates space-time kernels, including asymmetric time kernels and asymmetric time cycling kernels, with diagnostics (e.g., AICc) for determining optimal bandwidths.
    • For more details, refer to the vignette: GWR/Mixed-GWR-with-space-time-kernel.
  7. Large Sample Size Optimization:
    • For large datasets, the package employs RCCP, RCCPArmadillo and RCCPeigen code to accelerate computations.
    • It also supports parallel computing using the doParallel and future package.
    • All mgwrsar models can be estimated using a Target Points set, with a method for selecting an optimal set of target points (based on a specified size) to enhance computational efficiency.
    • For more information, see the vignette: Speeding_up_GWR_like_models.

Using the mgwrsar Package

The MGWRSAR function requires either:

  • A dataframe and a matrix of coordinates, or
  • An sf dataframe with embedded coordinates.

For this example, we use the mydatasf dataset, which contains real estate data (single houses with land plots) from the south of France. The dataset includes the price and four covariates:

  • price: Price in euros.
  • year: Year of sale.
  • footage: Total living area of the house.
  • land_area: Area of the land plot.
  • pbb45: Percentage of houses built before 1945 within a one-square-kilometer area (INSEE km grid, RGP 2010-2020).

Data example

library(mgwrsar)
#> Loading required package: sp
#> Loading required package: Matrix
#> Registered S3 method overwritten by 'future':
#>   method               from      
#>   all.equal.connection parallelly
library(sf)
#> Warning: package 'sf' was built under R version 4.3.3
#> Linking to GEOS 3.13.0, GDAL 3.8.5, PROJ 9.5.1; sf_use_s2() is TRUE
## loading data example
data(mydatasf)

p<-plot(mydatasf['price'],pch=19,cex=0.6)


mydata<-st_drop_geometry(mydatasf)
coords<-as.matrix(st_coordinates(mydatasf))
mycrs=st_crs(mydatasf)

Bandwidth optimization and Estimation for GWR

Estimating a canonical GWR or Mixed-GWR model requires selecting an optimal bandwidth for the chosen kernel. The mgwrsar package provides a wrapper that simplifies the process of identifying the optimal bandwidth across various model and kernel types, using either AICc or cross-validation (CV) criteria. While primarily designed for spatial kernels (Type=‘GD’), it can also be applied to space-time kernels (Type=‘GDT’) by specifying a fixed bandwidth for the temporal dimension.

The following example demonstrates how to search for the optimal bandwidth for GWR and MGWR (with a stationary intercept) models, using both AICc and CV criteria for adaptive and non-adaptive Gaussian kernels.

Bandwidth optimization and model estimation


#### 
res_AIC_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('bisq'), Model = 'GWR',control=list(adaptive=F,criterion='AICc',verbose=F),lower.bound=100, upper.bound=50000,tolerance=0.001,show_progress = FALSE)
res_AIC_non_adpative$minimum 
#> [1] 666.0672
res_AIC_non_adpative$ctime 
#> elapsed 
#>  10.193

model_GWR1<-MGWRSAR(formula='price~year+footage+land_area',H=res_AIC_non_adpative$minimum,data = mydata,coords=coords, fixed_vars=NULL,kernels=c('bisq'),Model='GWR',control=list(adaptive=F,verbose=F,SE=T))
model_GWR1@ctime 
#> elapsed 
#>   0.357
summary(model_GWR1)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata, 
#>     coords = coords, fixed_vars = NULL, kernels = c("bisq"), 
#>     H = res_AIC_non_adpative$minimum, Model = "GWR", control = list(adaptive = F, 
#>         verbose = F, SE = T))
#> 
#> Model: GWR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : bisq (Fixed / Distance) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 666.07 
#> ------------------------------------------------------
#> Model Settings:
#>    Computation time: 0.357 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: NO 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1403 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept             year           footage        land_area      
#>  Min.   :-1240468   Min.   :-52585   Min.   :-6151   Min.   :-487.32  
#>  1st Qu.:    4547   1st Qu.:  1773   1st Qu.: 1366   1st Qu.:  28.63  
#>  Median :   42360   Median :  3500   Median : 1756   Median :  40.23  
#>  Mean   :   19258   Mean   :  4446   Mean   : 1920   Mean   :  47.27  
#>  3rd Qu.:   62809   3rd Qu.:  6240   3rd Qu.: 2214   3rd Qu.:  48.22  
#>  Max.   : 2088953   Max.   : 96935   Max.   : 7449   Max.   : 412.79  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 1222.9 
#>    AICc: 36593.88 
#>    Residual sum of squares: 1.299298e+13 
#>    RMSE: 96233.34 
#> ------------------------------------------------------

res_AIC_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='AICc',verbose=F,NN=500),lower.bound=5, upper.bound=500,tolerance=0.001,show_progress = FALSE)
res_AIC_adpative$minimum 
#> [1] 12
res_AIC_adpative$ctime 
#> elapsed 
#>   2.449

model_GWR1<-res_AIC_adpative$model
summary(res_AIC_adpative$model)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = formula, data = data, coords = coords, fixed_vars = fixed_vars, 
#>     kernels = kernels, H = c(res, Ht), Model = Model, control = control_o)
#> 
#> Model: GWR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 12 
#> ------------------------------------------------------
#> Model Settings:
#>    Computation time: 0.063 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: YES (500 neighbors) 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1403 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept            year           footage          land_area      
#>  Min.   :-534813   Min.   :-10861   Min.   :  20.66   Min.   : -26.13  
#>  1st Qu.: -23587   1st Qu.:  1559   1st Qu.:1216.03   1st Qu.:  21.50  
#>  Median :  41463   Median :  4036   Median :1707.25   Median :  39.23  
#>  Mean   :  20557   Mean   :  4614   Mean   :1855.47   Mean   :  87.32  
#>  3rd Qu.:  80401   3rd Qu.:  6700   3rd Qu.:2340.79   3rd Qu.:  65.92  
#>  Max.   : 335102   Max.   : 28532   Max.   :4797.96   Max.   :1695.35  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 1153.83 
#>    AICc: 36676.68 
#>    Residual sum of squares: 1.201235e+13 
#>    RMSE: 92530.54 
#> ------------------------------------------------------

Bandwidth optimization with a non adpative gaussian kernel using CV

## optimal bandwidth using CV criteria
# optimization

resGWR_CV_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=F,criterion='CV',verbose=F,NN=500),lower.bound=5, upper.bound=50000,tolerance=0.001,show_progress = FALSE)
resGWR_CV_non_adpative$minimum 
#> [1] 449.1068
resGWR_CV_non_adpative$ctime
#> elapsed 
#>   6.113
model_GWR1<-resGWR_CV_non_adpative$model

Summary and plot sing leaflet of GWR likes models

# summary of optimal model
summary(model_GWR1)
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = formula, data = data, coords = coords, fixed_vars = fixed_vars, 
#>     kernels = kernels, H = c(res, Ht), Model = Model, control = control_o)
#> 
#> Model: GWR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Fixed / Distance) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 449.11 
#> ------------------------------------------------------
#> Model Settings:
#>    Computation time: 0.064 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: YES (500 neighbors) 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1403 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept            year          footage         land_area     
#>  Min.   :-487887   Min.   :-6489   Min.   :-451.1   Min.   :-60.28  
#>  1st Qu.:  -9007   1st Qu.: 2338   1st Qu.:1582.8   1st Qu.: 36.35  
#>  Median :  34570   Median : 3319   Median :1779.2   Median : 39.74  
#>  Mean   :  10748   Mean   : 4600   Mean   :1984.2   Mean   : 44.41  
#>  3rd Qu.:  52285   3rd Qu.: 5860   3rd Qu.:2277.2   3rd Qu.: 42.35  
#>  Max.   : 259204   Max.   :23427   Max.   :4645.1   Max.   :185.64  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 1303.75 
#>    AICc: 36722.03 
#>    Residual sum of squares: 1.641609e+13 
#>    RMSE: 108169.8 
#> ------------------------------------------------------

p<-plot(model_GWR1,type='B_coef',var='Intercept')
#> Warning in plot.mgwrsar(x = x, type = type, var = var, crs = crs, mypalette =
#> mypalette, : n_time_steps ignored (Model is not Spatio-Temporal GDT).
# Size Optimisation for the vignette
p<-plotly::partial_bundle(p)
p
p<-plot(model_GWR1,type='B_coef',var='footage')
#> Warning in plot.mgwrsar(x = x, type = type, var = var, crs = crs, mypalette =
#> mypalette, : n_time_steps ignored (Model is not Spatio-Temporal GDT).
# Size Optimisation for the vignette
p<-plotly::partial_bundle(p)
p

Estimation of the GWR model with Standard Error Computation

## if we want the computation of Standard Error of coefficients, we can restimate the model using  (SE=T) in control.

model_GWR1<-MGWRSAR(formula = 'price~year+footage+land_area', data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'),H=200, Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500,SE=T))

summary(model_GWR1) 
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = "price~year+footage+land_area", data = mydata, 
#>     coords = coords, fixed_vars = NULL, kernels = c("gauss"), 
#>     H = 200, Model = "GWR", control = list(adaptive = T, criterion = "CV", 
#>         verbose = F, NN = 500, SE = T))
#> 
#> Model: GWR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 200 
#> ------------------------------------------------------
#> Model Settings:
#>    Computation time: 0.122 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: YES (500 neighbors) 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1403 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Varying Parameters]
#>    Intercept           year         footage       land_area    
#>  Min.   :-27007   Min.   :1860   Min.   :1363   Min.   :28.33  
#>  1st Qu.: -2262   1st Qu.:2350   1st Qu.:1676   1st Qu.:38.74  
#>  Median : 23470   Median :3067   Median :1841   Median :40.56  
#>  Mean   : 20279   Mean   :3923   Mean   :1923   Mean   :41.04  
#>  3rd Qu.: 39546   3rd Qu.:5357   3rd Qu.:2113   3rd Qu.:43.59  
#>  Max.   : 60601   Max.   :8318   Max.   :2898   Max.   :54.15  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 1379.81 
#>    AICc: 36960.89 
#>    Residual sum of squares: 2.191749e+13 
#>    RMSE: 124987.5 
#> ------------------------------------------------------

Plot of the effects of spatially varying coefficients:

# the effect of variable i for point (u_i,v_i) corresponds to x_i(u_i,v_i) * \beta_i(u_i,v_i)
plot_effect(model_GWR1,title='Effects')

Bandwidth optimization and Estimation of a mixed GWR model (year AS FIXED VARS) with an adpative bisquare kernel.

## optimal bandwidth using AICc criteria
resMGWR_CV_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',Ht=NULL,data = mydata,coords=coords, fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(verbose=F,NN=500,criterion='CV',adaptive=T),lower.bound=6, upper.bound=502,tolerance=0.001,show_progress = FALSE)
resMGWR_CV_adpative$minimum # 8
#> [1] 6
model_MGWR<-resMGWR_CV_adpative$model
## summary of optimal model
summary(model_MGWR) 
#> ------------------------------------------------------
#> Call:
#> MGWRSAR(formula = formula, data = data, coords = coords, fixed_vars = fixed_vars, 
#>     kernels = kernels, H = c(res, Ht), Model = Model, control = control_o)
#> 
#> Model: MGWR 
#> ------------------------------------------------------
#> Kernels Configuration:
#>    Spatial Kernel  : gauss (Adaptive / Neighbors) 
#> ------------------------------------------------------
#> Bandwidth Configuration:
#>    Spatial Bandwidth (H): 6 
#> ------------------------------------------------------
#> Model Settings:
#>    Computation time: 0.054 sec
#>    Use of Target Points: NO 
#>    Use of rough kernel approximation: YES (500 neighbors) 
#>    Parallel computing: FALSE (ncore = 1) 
#>    Number of data points: 1403 
#> ------------------------------------------------------
#> Coefficients Summary:
#>    [Constant Parameters]
#>     year 
#> 5041.442 
#> 
#>    [Varying Parameters]
#>    Intercept          footage        land_area       
#>  Min.   :-971161   Min.   :-4189   Min.   :-1203.94  
#>  1st Qu.: -33476   1st Qu.: 1115   1st Qu.:   10.13  
#>  Median :  22935   Median : 1756   Median :   36.41  
#>  Mean   :  18676   Mean   : 1837   Mean   :  113.48  
#>  3rd Qu.:  85216   3rd Qu.: 2516   3rd Qu.:   79.18  
#>  Max.   : 584804   Max.   : 6300   Max.   : 3162.55  
#> ------------------------------------------------------
#> Diagnostics:
#>    Effective degrees of freedom: 1030.12 
#>    AICc: 36693.44 
#>    Residual sum of squares: 9.073811e+12 
#>    RMSE: 80420.36 
#> ------------------------------------------------------

Bootstrap tests

The mgwrsar_bootstrap_test function performs a bootstrap test for Beta coefficients in mgwrsar class models, with support for parallel computation. As this process can be time-consuming, it is not executed in this example.

D=as.matrix(dist(coords))

#  Stationnarity Test should be conducted using optimal bandwidth

resGWR=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500),lower.bound=5, upper.bound=502,tolerance=0.001,show_progress = FALSE)
model_GWR<-resGWR$model

resMGWR=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata,coords=coords, fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500),lower.bound=5, upper.bound=502,tolerance=0.001,show_progress = FALSE)
model_MGWR<-resMGWR$model

model_GWR@AICc
model_MGWR@AICc
model_GWR@RMSE
model_MGWR@RMSE

test<-mgwrsar_bootstrap_test(model_MGWR,model_GWR,B=30,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star   --> we can not reject HO --> year is not varying over space

# Significance Test of covariate "year"
resGWRc=golden_search_bandwidth(formula='price~footage+land_area',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500),lower.bound=5, upper.bound=502,tolerance=0.001,show_progress = FALSE)
model_GWRc<-resGWRc$model

model_GWRc@AICc
model_GWR@AICc
model_GWRc@RMSE 
model_GWR@RMSE

test<-mgwrsar_bootstrap_test(model_GWRc,model_GWR,B=30,doMC=F,ncore=1,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star --> we can reject HO --> "year" is significant

# Significance Test of covariate "random"
set.seed=1234
mydata$random=rnorm(nrow(coords))

resGWRnc=golden_search_bandwidth(formula='price~year+footage+land_area+random',data = mydata,coords=coords, fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F,NN=500),lower.bound=5, upper.bound=502,tolerance=0.001,show_progress = FALSE)
model_GWRnc<-resGWRnc$model

model_GWRnc@AICc
model_GWR@AICc
model_GWRnc@RMSE
model_GWR@RMSE

test<-mgwrsar_bootstrap_test(model_GWR,model_GWRnc,B=30,type='spatial',eps='H1',df='H1',focal='median',D=D)
test
#$p_star  --> we can not reject HO --> "random" is not significant

Prediction

In this example, we use an in-sample of 800 observations for model estimation and an out-sample of 200 observations for prediction. Predictions for GWR and MGWR can be made either by extrapolation (faster) or by re-estimating the model using out-sample locations as focal points (more accurate).

For GWR and Mixed-GWR models, the most accurate approach for out-of-sample prediction is to re-estimate the model coefficients using out-sample locations as focal points. In this case, the estimated coefficients themselves are not used; only the model call and input data are employed. Alternatively, coefficients can be extrapolated using a weighting matrix, which is the only available method for models incorporating spatial autocorrelation (e.g., MGWR_1_0_kv, MGXWR_0_0_kv, MGXWR_1_kc_kv, MGXWR_1_kc_0) (see the vignette GWR-and-Mixed-GWR-with-spatial-autocorrelation.Rmd for details). Users can then also choose to extrapolate model coefficients using a Shepard kernel with a specified number of neighbors (method_pred=‘shepard’, k_extra=12), apply the same kernel and bandwidth as the estimated model (method_pred=‘model’), or use the method proposed by Geniaux (2024) (method_pred=‘tWtp_model’), which is the default when re-estimating model coefficients using out-sample locations as focal points is not possible.

set.seed(123, kind = "L'Ecuyer-CMRG", normal.kind = "Inversion")
# build insample and outsample folds
length_out=1200
index_in=sample(1:nrow(mydata),length_out)
index_out=(1:nrow(mydata))[-index_in]

# estimate a GWR model on an insample fold

res_AIC_non_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata[index_in,],coords=coords[index_in,], fixed_vars=NULL,kernels=c('gauss'), Model = 'GWR',control=list(adaptive=T,criterion='CV',verbose=F),lower.bound=5, upper.bound=length(index_in),tolerance=0.001,show_progress = FALSE)
res_AIC_non_adpative$minimum
#> [1] 35

model_GWR_insample1<-res_AIC_non_adpative$model

# build outsample data
  newdata=mydata[index_out,]
  newdata_coords=coords[index_out,]
  
## prediction of the GWR model on the newdata  
  
## restimate the model coefficient using locations of out-sample as focal points 
  Y_pred1=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords)
  head(Y_pred1)
#> [1] 158880.1 145000.0 356006.2 132244.9 425365.1 138600.3
  head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
  sqrt(mean((mydata$price[index_out]-Y_pred1)^2)) 
#> [1] 267882.6

## Prediction with method_pred='tWtp' 
Y_pred2=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred2)
#> [1] 328070.8 289312.4 265561.5 131717.3 255524.8 147338.9
head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
sqrt(mean((mydata$price[index_out]-Y_pred2)^2)) 
#> [1] 119980.7

## Prediction with method_pred='model' 
Y_pred3=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='model')
head(Y_pred3)
#> [1] 328799.9 423345.5 266921.5 133010.7 285719.2 148010.1
head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
sqrt(mean((mydata$price[index_out]-Y_pred3)^2)) 
#> [1] 158846


## Prediction with method_pred='shepard' 
Y_pred4=predict(model_GWR_insample1, newdata=newdata, newdata_coords=newdata_coords,method_pred='shepard')
head(Y_pred4)
#> [1] 326861.1 295061.4 266580.6 128928.5 243170.8 141284.4
head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
sqrt(mean((mydata$price[index_out]-Y_pred4)^2)) 
#> [1] 122040.8


### Model Mixed-GWR (wrong model...)

resMGWR_CV_adpative=golden_search_bandwidth(formula='price~year+footage+land_area',data = mydata[index_in,],coords=coords[index_in,], fixed_vars='year',kernels=c('gauss'), Model = 'MGWR',control=list(adaptive=T,criterion='CV',verbose=F),lower.bound=5, upper.bound=length(index_in),tolerance=0.001,show_progress = FALSE)
res_AIC_non_adpative$minimum
#> [1] 35

model_MGWR_insample2<-resMGWR_CV_adpative$model


## restimate the model coefficient using locations of out-sample as focal points 
Y_pred5=predict(model_MGWR_insample2, newdata=newdata, newdata_coords=newdata_coords)
#> Warnings: method_pred=='TP' is not implemented for  Model in ('MGWR','MGWRSAR_1_kc_0','MGWRSAR_1_kc_kv','MGWRSAR_0_kc_kv','MGWRSAR_0_0_kv'): automatic swith to method_pred='tWtp_model'
head(Y_pred5)
#> [1] 13259277.496 24074325.650  3973234.896     5347.753  6026447.883
#> [6]   345802.942
head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
sqrt(mean((mydata$price[index_out]-Y_pred5)^2)) 
#> [1] 8709092

## extrapolation of beta coefs with tWtp_model kernel
Y_pred6=predict(model_MGWR_insample2, newdata=newdata, newdata_coords=newdata_coords,method_pred='tWtp_model')
head(Y_pred6)
#> [1] 13259277.493 24074325.653  3973234.906     5347.752  6026447.886
#> [6]   345802.953
head(mydata$price[index_out])
#> [1] 305000 244600 340000 123000 379000 200000
sqrt(mean((mydata$price[index_out]-Y_pred6)^2)) 
#> [1] 8709278

References

Brunsdon, C., Fotheringham, A., Charlton, M., 1996. Geographically weighted regression: a method for exploring spatial nonstationarity. Geogr. Anal. 28 (4), 281–298.

Friedman, J. H. 1984. A variable span smoother. Laboratory for Computational Statistics, Department of Statistics, Stanford University: Technical Report(5).

Geniaux, G. and Martinetti, D. 2018. A new method for dealing simultaneously with spatial autocorrelation and spatial heterogeneity in regression models. Regional Science and Urban Economics, 72, 74-85. https://doi.org/10.1016/j.regsciurbeco.2017.04.001

Geniaux, G. and Martinetti, D. 2017b. R package mgwrsar: Functions for computing (mixed) geographycally weighted regression with spatial autocorrelation,. https://CRAN.R-project.org/package=mgwrsar.

Geniaux, G. 2024. Speeding up estimation of spatially varying coefficients models. Jounal of Geographical System 26, 293–327. https://doi.org/10.1007/s10109-024-00442-3

Goulard, M., Laurent, T., & Thomas-Agnan, C., 2017, “About predictions in spatial autoregressive models: Optimal and almost optimal strategies,” published in Spatial Economic Analysis, 12(2-3), 304-325

Loader, C. 1999. Local regression and likelihood, volume 47. springer New York.

McMillen, D. P. 1996. One hundred fifty years of land values in chicago: A nonparametric approach. Journal of Urban Economics, 40(1):100 – 124.